Abstract— The XOR-satisfiability (XORSAT) problem deals
with a system of n Boolean variables and m clauses. Each clause
is a linear Boolean equation (XOR) of a subset of the variables.
A K-clause is a clause involving K distinct variables. In the
random K-XORSAT problem a formula is created by choosing
m K-clauses uniformly at random from the set of all possible
clauses on n variables. The set of solutions of a random formula
exhibits various geometrical transitions as the ratio m/n varies.

We consider a coupled K-XORSAT ensemble, consisting of
a chain of random XORSAT models that are spatially coupled 
across a finite window along the chain direction. We observe 
that the threshold saturation phenomenon takes place for this 
ensemble and we characterize various properties of the space of 
solutions of such coupled formulae. 



I. Introduction 

Spatial coupling is a technique that starts with a graphi- 
cal model and a "hard" computational task (e.g., decoding 
or more generally inference) and creates from this a new 
graphical model for the same task that has "locally" the same 
structure but is computationally "easy". Kudekar, Richardson
and Urbanke [1], [2] made the basic observation (in the context
of coding theory) that on spatially-coupled graphs,
low-complexity (message passing) algorithms suffice to achieve
optimal performance. Despite its very recent introduction,
spatial coupling has already had significant impact on coding,
communications, and compressive sensing (see for example
[3]-[9]) and has led to new insights in computer science and
statistical physics (see IlJl).

We consider the effect of spatial coupling on random XOR- 
SAT formulae. The XORSAT problem is the simplest instance 
among the class of constraint satisfaction problems (CSP). 
CSPs arise in many branches of science, e.g., in statistical 
physics (spin glasses), information theory (LDPC codes), and 
in combinatorial optimization (satisfiability, coloring). These 
CSPs are believed to share a number of common structural 
properties, but some models are inherently more difficult to 
investigate than others. It is therefore natural to start with 
relatively "simple" CSPs if one wants to learn more about
the general behavior of this class of models. 

It is relatively simple to capture the same basic properties 
in the XORSAT problem due to its direct connection with 
linear algebra. Among such properties, an important one
is the geometry of the space of solutions, which, as was
already understood a decade ago, displays very interesting
phase transitions [14], [15]. Recently in [16], [17], a fairly
complete characterization of this geometry has been provided
as a function of the ratio of number of clauses to number
of variables. In particular, it is shown that for some range 
of values of this parameter, the space of solutions breaks into 
many disconnected "clusters". It is widely believed that such a 
cluster structure is closely connected to the failure of standard 
message passing algorithms to find solutions (e.g., the belief 
propagation algorithm). In other words, it is believed that there 
is a strong connection between the "hardness" of the problem 
and the geometry of the solution space. Therefore we call this 
regime the hard-SAT regime. 

Consider now what happens when we spatially couple such 
formulae. As we will show in the following, a remarkable 
phenomenon called threshold saturation takes place: the belief 
propagation algorithm succeeds in solving the problem in the 
hard-SAT regime of the original (non-coupled) model. This 
immediately raises the question of how the space of solutions
changes under spatial coupling. In other words, what happens
to the clusters? A naive guess is that these clusters become
connected. As we will see, the answer is yes!

Our main objective is to provide an explanatory picture of 
how the geometry of the solution space is altered under spatial 
coupling. This picture can be helpful in further understanding 
the mechanism of spatial coupling, as well as in gaining some 
intuition about the solution space of other coupled CSPs, or 
in designing efficient algorithms for solving them [12].

The outline of this paper is as follows. In Section I-A
we introduce in detail the XORSAT problem and random
K-XORSAT ensembles. We also explain in brief the related
results on the geometry of the solution space of these random
formulae. In Section II we introduce the coupled K-XORSAT
ensemble. Using the results of [13] and [12] we then prove the
threshold saturation phenomenon for this ensemble. Finally,
we discuss the geometry of the space of solutions of this
ensemble by a direct use of the techniques in [16].

A. The K-XORSAT Ensemble: Basic Setting 

An XORSAT formula consists of n Boolean variables
$x_i \in \{0,1\}$, $i \in \{1, \dots, n\}$, and a set of m exclusive
OR (XOR) constraints $c \in \{1, \dots, m\}$. Each constraint c,
called from now on a clause, is a linear equation consisting of
the XOR of some variables being equal to a Boolean value
$b_c \in \{0,1\}$. The number of variables involved in a clause is
called the length of the clause. A clause of length K is called
a K-clause, and a K-XORSAT formula is a formula consisting only
of K-clauses. In matrix form, a K-XORSAT formula can be
represented as the linear system



$$Hx = b. \tag{1}$$



Here, the matrix H is an $m \times n$ matrix with entries
$H_{c,i} \in \{0,1\}$, and $H_{c,i}$ is equal to 1 if and only if
clause c contains the variable $x_i$. The vector x is an
n-component vector representing the variables and the vector b
is an m-component vector representing the clause values $b_c$.

It is convenient to represent an XORSAT formula via a
bipartite graph $G = (V \cup C, E)$, where we denote the set
of variable nodes by V and the set of clause nodes by C. We
thus have $|V| = n$ and $|C| = m$. There is an edge between a
clause $c \in C$ and a variable $i \in V$ if and only if c contains
$x_i$. The set of edges of G is denoted by E.

Let us now explain the ensemble of random K-XORSAT
formulae. Let $m = \lfloor \alpha n \rfloor$, where $\alpha$ is a
positive real number called the clause density. To choose an
instance from the K-XORSAT ensemble, we proceed as follows.
There are m clauses of length K and n variables. Each clause
picks uniformly at random a subset of K of the variables
and flips a fair coin to decide the value of $b_c$. All the above
steps are taken independently of each other. In other words,
the random K-XORSAT ensemble is defined by taking b
uniformly at random in $\{0,1\}^m$ and H uniformly at random
from the set of all the $m \times n$ matrices with entries in
$\{0,1\}$ that have exactly K ones per row.
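
The sampling procedure just described can be sketched in a few lines of Python. This is only an illustration: the function names `sample_kxorsat` and `is_solution`, the 0-based variable indices, and the representation of a clause as a sorted list of indices are assumptions made here, not notation from the paper.

```python
import math
import random


def sample_kxorsat(n, alpha, K, seed=None):
    """Draw one formula from the random K-XORSAT ensemble.

    Returns (clauses, b): clauses[c] lists the K distinct variables of
    clause c (0-based indices) and b[c] is its right-hand side bit.
    """
    rng = random.Random(seed)
    m = math.floor(alpha * n)  # number of clauses, m = floor(alpha * n)
    # Each clause picks a uniformly random K-subset of the n variables.
    clauses = [sorted(rng.sample(range(n), K)) for _ in range(m)]
    # Each clause flips a fair coin for its value b_c.
    b = [rng.randrange(2) for _ in range(m)]
    return clauses, b


def is_solution(x, clauses, b):
    """Check H x = b over GF(2) for a 0/1 assignment x."""
    return all(sum(x[i] for i in vs) % 2 == bc for vs, bc in zip(clauses, b))


clauses, b = sample_kxorsat(n=10, alpha=0.5, K=3, seed=0)
```

With b replaced by the all-zero vector, the all-zero assignment is always a solution, which `is_solution` can be used to confirm.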

One objective of the XORSAT problem is to specify whether
a given formula has a solution or not. Standard linear algebraic
methods allow us to accomplish this task with complexity
$O(n^3)$. Here, we discuss a linear complexity algorithm for
solving XORSAT formulae called the peeling algorithm. In
our case, this algorithm is known to be equivalent to the belief
propagation (BP) algorithm.

B. The Peeling Algorithm 

We begin by a brief explanation of the algorithm. Let G be 
an XORSAT formula. As mentioned previously, we can think 
of G as a bipartite graph. The algorithm starts with G and in 
each step shortens G until we either reach the empty graph 
or we cannot make any further shortening. Assume now that
there exists a variable i in G with degree 0 or 1. In the former
case, the value of the variable can be chosen freely. Also, in 
the latter case, assuming c is the check node connected to i, 
it is easy to see that the value of Xi can be determined after 
the values of the other variables connected to c are specified. 
Hence, without loss of generality, we can remove i and its 
neighboring clause (if any) from G and search for a solution 
for the graph G\i. In other words, finding a solution for G is 
equivalent to finding a solution for G \ i. As a result, we can 
peel the variable i from G and do the same procedure on G\i. 
We continue this process until the residual graph is empty or 
it has no more variables with degree at most 1. The final graph 
that we reach by the peeling procedure is called the 2-core
or the maximal stopping set of G. We recall that a stopping 
set of G is a subgraph of G containing a set of clauses and 
a set of variables where each clause has degree K and all 
the variables have degree at least 2. The 2-core is a stopping 
set of maximum size. The peeling algorithm determines the 
2-core of a graph G. If the 2-core is empty then the algorithm 
succeeds and it is easy to see that the solution can be explicitly
found by backtracking. 
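
The peeling procedure and the backtracking step can be sketched as follows. This is a minimal illustration under stated assumptions: clauses are lists of distinct 0-based variable indices, the names `peel` and `solve_by_backtracking` are hypothetical, and free (degree-0) variables are simply set to 0.

```python
def peel(n, clauses):
    """Sequential peeling on the bipartite graph of an XORSAT formula.

    Returns (order, core): `order` lists (variable, clause) pairs in the
    order they were removed (clause is None for a degree-0 variable) and
    `core` is the set of clause indices left in the 2-core.
    """
    incident = [set() for _ in range(n)]      # clauses touching each variable
    for c, vs in enumerate(clauses):
        for i in vs:
            incident[i].add(c)
    removed = [False] * n
    core = set(range(len(clauses)))
    stack = [i for i in range(n) if len(incident[i]) <= 1]
    order = []
    while stack:
        i = stack.pop()
        if removed[i]:
            continue
        removed[i] = True
        c = next(iter(incident[i]), None)     # the unique neighbour, if any
        order.append((i, c))
        if c is not None:
            core.discard(c)
            for j in clauses[c]:              # delete clause c from the graph
                incident[j].discard(c)
                if not removed[j] and len(incident[j]) <= 1:
                    stack.append(j)
    return order, core


def solve_by_backtracking(n, clauses, b, order):
    """Rebuild a solution when the 2-core is empty (free variables set to 0)."""
    x = [0] * n
    for i, c in reversed(order):
        if c is not None:
            # x_i is fixed by clause c once its other variables are known.
            x[i] = (b[c] + sum(x[j] for j in clauses[c] if j != i)) % 2
    return x
```

Processing the peeling order in reverse works because every other variable of the clause removed together with i was still present when i was peeled, and is therefore assigned earlier in the backward pass.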



The peeling algorithm has an equivalent message passing 
(MP) formulation. It can be shown that the message passing 
rules for the peeling algorithm are also equivalent to the BP 
update rules. Further, if the formula G comes from the K- 
XORSAT ensemble, then one can analyze the behavior of the 
peeling algorithm in a probabilistic framework called density 
evolution (DE). The DE equations can be cast into a simple 
scalar recursion [19]

$$x^{t+1} = 1 - \exp\{-\alpha K (x^{t})^{K-1}\}, \tag{2}$$

with $x^0 = 1$. Here, $x^t$ is related to the fraction of edges present
in the remaining graph at time t. For the peeling algorithm to
succeed, the value of $x^t$ should tend to 0 as t increases. This
is possible if and only if the equation

$$x = 1 - \exp\{-\alpha K x^{K-1}\}, \tag{3}$$

has a unique solution which is the trivial fixed point $x = 0$.
The net result is that the peeling algorithm succeeds with high
probability (w.h.p.) for $\alpha < \alpha_d(K)$ defined as

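$$\alpha_d(K) = \sup\{\alpha > 0 \ \text{s.t.}\ \forall x \in (0,1] :\ x > \alpha (1 - \exp(-Kx))^{K-1}\}.$$

Here x plays the role of $\alpha x^{K-1}$ in (3): substituting
$u = \alpha x^{K-1}$ into (3) gives $u = \alpha(1-\exp(-Ku))^{K-1}$,
so the condition states that this equation has no solution in (0,1].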
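
The threshold can be evaluated numerically. Rearranging the condition $x > 1 - \exp\{-\alpha K x^{K-1}\}$ for $x \in (0,1)$ gives $\alpha < -\ln(1-x)/(K x^{K-1})$, so $\alpha_d(K)$ is the minimum of the right-hand side over (0,1). The sketch below uses a plain grid search for that minimum and also iterates recursion (2) directly; the function names and the grid size are arbitrary choices made here.

```python
import math


def density_evolution(alpha, K, iters=5000):
    """Iterate x^{t+1} = 1 - exp(-alpha*K*(x^t)^(K-1)) from x^0 = 1."""
    x = 1.0
    for _ in range(iters):
        x = 1.0 - math.exp(-alpha * K * x ** (K - 1))
    return x


def alpha_d(K, grid=200000):
    """Peeling threshold: min over x in (0,1) of -ln(1-x) / (K x^(K-1))."""
    best = float("inf")
    for j in range(1, grid):
        x = j / grid
        best = min(best, -math.log(1.0 - x) / (K * x ** (K - 1)))
    return best
```

For K = 3 the grid search returns a value close to 0.818, and recursion (2) collapses to 0 slightly below that density while staying at a non-trivial fixed point slightly above it.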

For $\alpha > \alpha_d(K)$ the peeling algorithm is w.h.p. stuck in
the 2-core of the graph. It can be shown [16] that the 2-core
consists w.h.p. of $nV(\alpha,K)(1+o(1))$ variables and
$nC(\alpha,K)(1+o(1))$ clauses, where $V(\alpha,K)$ and $C(\alpha,K)$
are given as follows. Let x be the largest solution of (3) and
$\hat{x} = x^{K-1}$; we have

$$V(\alpha,K) = 1 - (1 + K\alpha\hat{x})\exp\{-\alpha K\hat{x}\}, \tag{4}$$
$$C(\alpha,K) = \alpha\hat{x}\,(1 - \exp\{-\alpha K\hat{x}\}). \tag{5}$$

C. Phase Transitions and the Space of Solutions 

For a random K-XORSAT formula with $\alpha < \alpha_d(K)$, the
peeling algorithm succeeds w.h.p. and hence the formula has
a solution. What happens for $\alpha > \alpha_d(K)$? It is easy to
see that for $\alpha > 1$ the formula has w.h.p. no solution. In
fact, there exists a critical density $\alpha_s(K)$ such that when
the clause density crosses $\alpha_s(K)$, the K-XORSAT ensemble
undergoes a phase transition from almost certain solvability to
almost certain unsatisfiability. The value $\alpha_s(K)$ is called the
SAT/UNSAT threshold and is given as

$$\alpha_s(K) = \sup\{\alpha > 0 \ \text{s.t.}\ V(\alpha,K) > C(\alpha,K)\}. \tag{6}$$
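
The SAT/UNSAT threshold can be located numerically from (3), (4) and (5). The sketch below is an illustration only: it finds the largest fixed point by iterating from x = 1 and then bisects on the sign of V − C. It assumes that V − C changes sign exactly once on the search interval and that the starting density `lo` already lies above $\alpha_d(K)$; the function names are hypothetical.

```python
import math


def largest_fixed_point(alpha, K, iters=5000):
    """Largest solution of x = 1 - exp(-alpha*K*x^(K-1)), reached from x = 1."""
    x = 1.0
    for _ in range(iters):
        x = 1.0 - math.exp(-alpha * K * x ** (K - 1))
    return x


def core_sizes(alpha, K):
    """Return (V, C), the normalized 2-core sizes of equations (4) and (5)."""
    x_hat = largest_fixed_point(alpha, K) ** (K - 1)
    lam = alpha * K * x_hat
    V = 1.0 - (1.0 + lam) * math.exp(-lam)
    C = alpha * x_hat * (1.0 - math.exp(-lam))
    return V, C


def alpha_s(K, lo, hi=1.0, steps=40):
    """Bisection for the density at which V - C changes sign.

    `lo` must be a density above alpha_d(K) at which V > C still holds.
    """
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        V, C = core_sizes(mid, K)
        if V > C:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For example, `alpha_s(3, lo=0.85)` lands close to 0.917, in line with the first row of Table I.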

The value of $\alpha_d$ separates two phases. For $\alpha < \alpha_d$
the graph has no 2-core whereas for $\alpha \in (\alpha_d, \alpha_s)$
the graph has a large 2-core and no algorithm is known to find a
solution in linear time. These two phases differ also in the
structure of their solution space as we explain now. We assume
without loss of generality that the vector b is the all-zero
vector. Note here that a non-zero b affects the solution space of
the homogeneous system only by a shift and hence does not alter
its structure.

The solutions of a formula are members of the Hamming
cube $\{0,1\}^n$. For $x, y \in \{0,1\}^n$ we let $d(x,y)$ denote
their Hamming distance. For $\alpha < \alpha_d$, there exists a
constant $B < \infty$ such that w.h.p. the following holds [16]. Let
$d = (\log n)^B$. Consider two solutions $x, x'$. Then, there exists
a sequence of solutions $x = x_0, x_1, \dots, x_r = x'$ such that
$d(x_i, x_{i+1}) \le d$. Thus, for $\alpha < \alpha_d$, the space of
solutions can be imagined as a big cluster in which one can walk
from one solution to another by a number of steps, each of size
at most d (sub-linear in n). For $\alpha \in (\alpha_d, \alpha_s)$ the
space of solutions shatters into an exponential number of clusters.
Each cluster corresponds to a solution of the 2-core in the
following sense. Given an assignment x, we denote by $\pi(x)$ its
projection onto the core. In other words, $\pi(x)$ is the vector of
those entries in x that correspond to vertices in the core. Now,
for a solution of the core, $x_{\rm core}$, we define the cluster
associated to $x_{\rm core}$ as the set of solutions x to the whole
formula such that $\pi(x) = x_{\rm core}$. Hence, for each solution
of the core, there exists one cluster in the space of solutions of
the formula. It can be shown that any two solutions of the core
differ in $\Omega(n)$ positions [19]. Thus, any two solutions
belonging to two different clusters also differ in $\Theta(n)$
positions. However, each cluster by itself has a connected
structure in the sense that for any two solutions $x, x'$ belonging
to the cluster, there exists a sequence of solutions
$x = x_0, x_1, \dots, x_r = x'$ inside that cluster such that
$d(x_i, x_{i+1}) \le d$. Figure 1 shows a symbolic picture of the
clustering of solutions in the two phases.







Fig. 1. A symbolic picture of the space of solutions for the K-XORSAT
ensemble. Below $\alpha_d$ the space looks like a big connected cluster
whereas in the region $\alpha \in (\alpha_d, \alpha_s)$ the solution space
breaks into exponentially many clusters far away from each other.



II. The Coupled K-XORSAT Ensemble

This ensemble represents a chain of coupled underlying
ensembles. Figure 2 is a visual aid but gives only a partial view.
We consider $L - w + 1$ clause positions $z \in \{0, 1, \dots, L-w\}$
and L variable positions $z \in \{0, 1, \dots, L-1\}$. At each
variable position z, we lay down n Boolean variables. Also,
for each check position z, we lay down $m = \lfloor \alpha n \rfloor$
clauses of length K. So in total we have nL variables and
$m(L-w+1)$ clauses. Let us now specify how the set of edges, E,
is chosen. Each clause c at a position z chooses its K variables
via the following procedure. We first pick a position $z+k$ with k
uniformly random in the window $\{0, \dots, w-1\}$, then we
pick a variable uniformly at random among all the variables
located at position $z+k$, and finally we connect the clause
and the variable. The value of $b_c$ is also chosen by flipping a
fair coin. This ensemble is called the (spatially) coupled
K-XORSAT ensemble and an instance of it is called a coupled
formula.






Fig. 2. A representation of the geometry of the graphs with window 
size w = 3 along the "longitudinal chain direction" z. The "transverse 
direction" is viewed from the top. At each position there is a stack of 
n variable nodes (circles) and a stack of m constraint nodes (squares).
The depicted links between constraint and variable nodes represent 
stacks of edges. 

It is also useful to consider another ensemble of coupled
graphs where positions are placed on a ring. This ensemble
is called the ring ensemble and is obtained as follows. We
consider L clause positions $z \in \{0, 1, \dots, L-1\}$ and L
variable positions $z \in \{0, 1, \dots, L-1\}$. At each variable
position z, we lay down n Boolean variables. Also, for each
check position z, we lay down $m = \lfloor \alpha n \rfloor$ clauses
of length K. So in total we have nL variables and mL clauses. Each
clause c at position z chooses its K variables via the following
procedure. We first pick a position mod(z + k, L) with k
uniformly random in the window $\{0, \dots, w-1\}$, then we pick
a variable node uniformly at random among all the variables
located at position mod(z + k, L), and finally we connect the
clause and the variable. The value of $b_c$ is also chosen by
flipping a fair coin. It can be easily seen that by picking a
random ring formula and removing all of its clauses that are
placed at positions $L-w+1, \dots, L-1$, we generate a coupled
formula.
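
The construction of a ring formula and its opening into a coupled formula can be sketched as follows. This is an illustration only: the names `sample_ring_formula` and `open_ring`, and the encoding of a clause as a tuple `(z, variables, b)`, are assumptions made here. Reading the procedure literally, the K picks of a clause are independent, so the same variable may occasionally be picked twice; the sketch does not resample in that case.

```python
import math
import random


def sample_ring_formula(n, alpha, K, L, w, seed=None):
    """Sample a ring formula.

    Each clause is a tuple (z, variables, b) where z is the clause position
    and `variables` lists K pairs (position, index within that position).
    """
    rng = random.Random(seed)
    m = math.floor(alpha * n)
    formula = []
    for z in range(L):
        for _ in range(m):
            variables = []
            for _ in range(K):
                k = rng.randrange(w)              # offset inside the window
                variables.append(((z + k) % L, rng.randrange(n)))
            formula.append((z, variables, rng.randrange(2)))
    return formula


def open_ring(formula, L, w):
    """Drop the clauses at positions L-w+1, ..., L-1 to get a coupled formula."""
    return [clause for clause in formula if clause[0] <= L - w]


ring = sample_ring_formula(n=20, alpha=0.5, K=3, L=8, w=3, seed=1)
chain = open_ring(ring, L=8, w=3)
```

After opening, no remaining clause wraps around the ring: a clause at position z only touches variable positions z, ..., z + w − 1.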

A. Threshold Saturation 

The peeling algorithm can be used for the coupled and
ring formulae in the same manner as explained above. We
denote by $\alpha_{d,L,w}(K)$ and $\alpha^{\rm ring}_{d,L,w}(K)$ the
thresholds for the emergence w.h.p. of a non-empty 2-core for the
coupled and ring ensembles, respectively. We also denote the
SAT/UNSAT thresholds for these ensembles by $\alpha_{s,L,w}(K)$ and
$\alpha^{\rm ring}_{s,L,w}(K)$, respectively.

Let us first consider the coupled ensemble. A message passing
analysis similar to the one above yields a set of one-dimensional
coupled recursions

$$x_z^{t+1} = 1 - \frac{1}{w}\sum_{l=0}^{w-1} \exp\Big\{-\alpha K \Big(\frac{1}{w}\sum_{k=0}^{w-1} x^{t}_{z+k-l}\Big)^{K-1}\Big\}, \tag{7}$$

with boundary values $x_z^t = 0$ for $z \ge L$ and $z < 0$. This
recursion results in the one-dimensional fixed point equations

$$x_z = 1 - \frac{1}{w}\sum_{l=0}^{w-1} \exp\Big\{-\alpha K \Big(\frac{1}{w}\sum_{k=0}^{w-1} x_{z+k-l}\Big)^{K-1}\Big\}, \tag{8}$$

with boundary values $x_z = 0$ for $z \ge L$ and $z < 0$. We recall
that $\alpha_{d,L,w}(K)$ is the highest clause density for which the
fixed point equation (8) admits a unique solution that is the
all-zero solution.
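
The coupled recursion can be iterated numerically and compared with the uncoupled one. The sketch below implements recursion (7) literally, with all positions outside 0, ..., L − 1 pinned to 0; the function names, the stopping tolerance and the small parameters in the usage example are choices made here for illustration and are not the settings behind Table I.

```python
import math


def coupled_de(alpha, K, L, w, max_iters=20000, tol=1e-12):
    """Iterate the coupled recursion from x_z = 1 on z = 0..L-1.

    Positions outside 0..L-1 are pinned to 0 (the boundary condition).
    Returns the final profile [x_0, ..., x_{L-1}].
    """
    x = [1.0] * L

    def val(profile, z):
        return profile[z] if 0 <= z < L else 0.0

    for _ in range(max_iters):
        new = []
        for z in range(L):
            total = 0.0
            for l in range(w):
                avg = sum(val(x, z + k - l) for k in range(w)) / w
                total += math.exp(-alpha * K * avg ** (K - 1))
            new.append(1.0 - total / w)
        converged = max(abs(a - c) for a, c in zip(new, x)) < tol
        x = new
        if converged:
            break
    return x


def uncoupled_de(alpha, K, iters=5000):
    """Scalar recursion of the uncoupled ensemble, started from x = 1."""
    x = 1.0
    for _ in range(iters):
        x = 1.0 - math.exp(-alpha * K * x ** (K - 1))
    return x


profile = coupled_de(alpha=0.85, K=3, L=16, w=3)
scalar = uncoupled_de(alpha=0.85, K=3)
```

At density 0.85 for K = 3, which lies between the uncoupled peeling threshold and the SAT/UNSAT threshold, the scalar recursion stays at a non-trivial fixed point while the coupled profile is driven to zero from the boundaries.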
Lemma 1: We have

$$\lim_{w\to\infty}\lim_{L\to\infty} \alpha_{d,L,w}(K) = \lim_{L\to\infty} \alpha_{s,L,w}(K) = \alpha_s(K). \tag{9}$$

K                   3      4      5      7
$\alpha_s$          0.917  0.976  0.992  0.999
$\alpha_d$          0.818  0.772  0.701  0.595
$\alpha_{d,L,w}$    0.917  0.977  0.992  0.999

TABLE I
Rows, from top to bottom: phase transition threshold for K-XORSAT;
peeling threshold for the uncoupled ensemble; peeling threshold for
the coupled ensemble with w = 5, L = 80.



Proof: The fact that $\alpha_{s,L,w}$ tends to $\alpha_s$ as L grows
large follows from the interpolation arguments of [12]. For
the other limit, from (6) it can be shown that $\alpha_s$ corresponds
to the potential threshold (defined in [13]) of the scalar
recursion (2). Hence, it follows from [13, Theorem 1] that
$\lim_{w\to\infty}\lim_{L\to\infty} \alpha_{d,L,w} = \alpha_s$. ■
As a result, as L and w grow large the peeling algorithm
succeeds at densities very close to $\alpha_s(K)$. Table I contains
some numerical predictions of $\alpha_{d,L,w}(K)$. For the ring
ensemble, the fixed point equations for the peeling algorithm become

$$x_z = 1 - \frac{1}{w}\sum_{l=0}^{w-1} \exp\Big\{-\alpha K \Big(\frac{1}{w}\sum_{k=0}^{w-1} x_{\mathrm{mod}(z+k-l,\,L)}\Big)^{K-1}\Big\}. \tag{10}$$

It is easy to see that for $\alpha > \alpha_d(K)$, the above set of
fixed point equations admits a nontrivial solution of the following
form: for $z \in \{0, 1, \dots, L-1\}$, we have $x_z = x$, where x
is the largest solution of the fixed point equation (3). For
$\alpha < \alpha_d$, it is also clear that there is only one solution,
which is the all-zero solution. Hence, for the ring ensemble we
obtain for any choice of L and w

$$\alpha^{\rm ring}_{d,L,w}(K) = \alpha_d(K). \tag{11}$$

By combining (11) and (9) one observes the following
remarkable phenomenon. Let L and w be large but finite
numbers such that $L \gg w$. For these choices of L, w we
have from (9) that $\alpha_{d,L,w}(K) \approx \alpha_s(K)$. Also, let
$\alpha \in [\alpha_d, \alpha_{d,L,w}]$ and pick a formula from the ring
ensemble. We deduce from (11) that such a formula has a non-trivial
2-core. Furthermore, it can be shown that the 2-core has a circular
structure and for each position $z \in \{0, \dots, L-1\}$ it has
$nV(\alpha,K)(1+o(1))$ variables and $nC(\alpha,K)(1+o(1))$ clauses.
Now, assume that from this 2-core we remove all the clauses
at positions $L-w+1, \dots, L-1$ (i.e., we open the ring,
see Figure 3) and run the peeling algorithm on the remaining
graph. From (9) we deduce that the peeling algorithm succeeds
on the remaining graph in the sense that it continues all the
way until it reaches the empty graph. Note here that the fraction
of the clauses that we remove from the 2-core is $(w-1)/L$, which
vanishes as we choose $L \gg w$.

B. The Set of Solutions 

We now focus on the geometrical properties of the space
of solutions of the coupled and ring formulas. Given the fact
that for $\alpha \in (\alpha_d, \alpha_s)$ a ring formula has a core, we
deduce that for this region of $\alpha$ the set of solutions of a ring
formula resembles the set of solutions of an uncoupled formula which

Fig. 3. For $\alpha \in (\alpha_d, \alpha_{d,L,w})$, a random formula from the
ring ensemble has w.h.p. a non-trivial 2-core. The top figure is a simple
example of a 2-core associated to a ring formula with L = 8 and w = 2.
When we open the 2-core by removing the check nodes at positions
$L-w+1, \dots, L-1$ (the bottom figure), the remaining graph has w.h.p.
no 2-core.

was explained in Section I-C. In other words, the space of
solutions of a ring formula shatters into exponentially many
clusters (see Figure 1). Each cluster corresponds to a unique
solution of the 2-core. Also, each cluster is itself connected
and the distance between any two different clusters is $\Omega(nL)$.
Now, assume L and w are large but finite numbers such that
$L \gg w$. For these choices of L, w we have from (9) that
$\alpha_{d,L,w}(K) \approx \alpha_s(K)$. Let $\alpha \in [\alpha_d, \alpha_{d,L,w})$
and pick a formula from the coupled ensemble. Let us denote this
formula by F and its set of solutions by S. This formula w.h.p.
does not have a core. Also, we keep in mind that a coupled formula
can be obtained from a typical ring formula by removing the
clauses at the last $w-1$ positions. We denote such a ring formula
by $F^{\rm ring}$ and its set of solutions by $S^{\rm ring}$. We know that
$S^{\rm ring}$ shatters into exponentially many clusters. It is easy to
see that $S^{\rm ring} \subseteq S$. As a result S contains all the clusters
of $S^{\rm ring}$. Given these facts, what does the space S look like?
In particular, how are the two spaces S and $S^{\rm ring}$ related?
We now show that the space S is a connected cluster.

Theorem 2: Let $\alpha \in (\alpha_d, \alpha_{d,L,w})$. Consider a random
coupled K-XORSAT formula and let S be its set of solutions.
The set S is a connected cluster in the following sense.
There exists a $B = B(\alpha,K) < \infty$ such that for any two
solutions $x, x' \in S$, there exists a sequence of solutions
$x = x_0, x_1, \dots, x_r = x'$ such that
$d(x_i, x_{i+1}) \le (e^{L}\log n)^{B}$.
Proof sketch: The proof of this theorem essentially mimics the
proof of Theorem 2 in [16] except for the last part. For the
sake of brevity, we only give a sketch of the proof. The
proof goes by showing that the set of solutions of the equation
$Hx = 0$, i.e., the kernel of the matrix H, has a sparse basis. In
other words, there exist vectors $y_1, y_2, \dots, y_l$ that span the
space kernel(H), and each of the vectors has a low weight,
i.e., $w(y_i) \le (e^{L}\log n)^{B}$, where $w(\cdot)$ denotes the Hamming
weight. We call such a basis a sparse basis. It is easy to see
that if such a basis exists for the space of solutions, then the
result of the theorem holds.
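
The step from a sparse basis to connectivity can be spelled out as follows. This is a short sketch of the standard argument; the index set $J$ and the ordering $j_1, \dots, j_r$ are notation introduced here for illustration.

```latex
% Why a sparse basis of kernel(H) implies that S is connected.
\begin{align*}
  &\text{Let } x, x' \in S. \text{ Then } H(x \oplus x') = b \oplus b = 0,
    \text{ so } x \oplus x' = \bigoplus_{j \in J} y_j
    \text{ for some } J = \{j_1, \dots, j_r\}. \\
  &\text{Define } x_0 = x, \qquad x_i = x_{i-1} \oplus y_{j_i}
    \quad (1 \le i \le r), \qquad \text{so that } x_r = x'. \\
  &\text{Each } x_i \in S \text{ because }
    H x_i = H x_{i-1} \oplus H y_{j_i} = b \oplus 0 = b. \\
  &\text{Consecutive solutions satisfy } d(x_{i-1}, x_i) = w(y_{j_i}),
    \text{ which is at most the weight bound on the basis vectors.}
\end{align*}
```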
We now proceed by explicitly constructing such a basis. We
first show that if the matrix H has no core, then the peeling
procedure provides us with a natural choice of a basis for
kernel(H). We then show that such a basis is indeed sparse. In
this regard, we consider a slightly modified, but equivalent,
version of the peeling algorithm called the synchronous
peeling algorithm. Given an initial formula (graph) G, this
algorithm consists of T(G) rounds $t = 1, 2, \dots, T(G)$. The
residual graph at the end of round t is denoted by $J_t$. We
also let $J_0 = G$. We denote the set of clauses, variables
and edges removed at round t by $(C_t, V_t, E_t)$. Hence for
$t \ge 1$ we have $J_{t-1} = J_t \cup (C_t, V_t, E_t)$. At each round t,
the algorithm considers the graph $J_{t-1}$ and removes all the
variable nodes that have degree 1 or less together with all the
clauses (if any) connected to these variables. It is easy to see
that synchronous peeling is a compressed version of the peeling
algorithm described in Section I-B. Assuming that the initial
graph G has no core, the final graph $J_{T(G)}$ is empty.
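
The synchronous variant can be sketched as below: each round removes all variables of degree at most 1 at once, together with their clauses. The function name and the return format (a list of pairs of sets, one pair per round) are assumptions made here for illustration.

```python
def synchronous_peeling(n, clauses):
    """Return (rounds, core): rounds[t-1] = (V_t, C_t) and the leftover clauses."""
    incident = [set() for _ in range(n)]      # clauses touching each variable
    for c, vs in enumerate(clauses):
        for i in vs:
            incident[i].add(c)
    alive_vars = set(range(n))
    alive_clauses = set(range(len(clauses)))
    rounds = []
    while True:
        # All variables of degree 0 or 1 in the current residual graph.
        V_t = {i for i in alive_vars if len(incident[i]) <= 1}
        if not V_t:
            break
        # The clauses (if any) attached to those variables.
        C_t = {c for i in V_t for c in incident[i]}
        alive_vars -= V_t
        alive_clauses -= C_t
        for c in C_t:
            for j in clauses[c]:
                incident[j].discard(c)
        rounds.append((V_t, C_t))
    return rounds, alive_clauses


rounds, core = synchronous_peeling(6, [[0, 1, 2], [1, 3, 4], [2, 3, 5], [1, 2, 3]])
```

In this small example the number of rounds T(G) is `len(rounds)`, and the last clause can only be removed in the second round because all of its variables have degree 3 at the start.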

To ease the analysis, let us re-order the clauses and the
variables in the following way. We start from the clauses in
$C_1$ and order these clauses (in an arbitrary way) from 1 to
$|C_1|$. We then consider the clauses in $C_2$ and order them (in an
arbitrary way) from $|C_1| + 1$ to $|C_1| + |C_2|$, and so on. We do
the same for the variable nodes but with the following
additional ordering. Within each set $V_t$, the ordering is chosen
in such a way that nodes that have degree 0 in $J_{t-1}$ appear
with a smaller index than the ones that have degree 1. Now,
with such a re-ordering of the nodes in the graph, the matrix
H has the following fine structure. For sets $P \subseteq C$ and
$Q \subseteq V$, we let $H_{P,Q}$ be the sub-matrix of H that consists
of the elements of H whose rows are $c \in P$ and whose columns are
$i \in Q$. The matrix H can be partitioned into $T(G) \times T(G)$
block matrices $H_{C_s,V_t}$, where $1 \le s,t \le T(G)$, such that for
$s > t$, $H_{C_s,V_t}$ is the all-zero matrix and the diagonal blocks
$H_{C_t,V_t}$ have a staircase structure. Here, by a staircase structure
we mean that the set of columns $V_t$ can be partitioned
into $|C_t| + 1$ groups $\mathcal{C}_0, \dots, \mathcal{C}_{|C_t|}$ such that
the columns in $\mathcal{C}_0$ are all-zero and the columns in
$\mathcal{C}_i$ have only their i-th entry equal to 1 and the rest equal
to 0. Given such a decomposition of H, it is now easy to see how one
can find a basis for its kernel. In fact, the matrix H has
essentially an upper triangular structure. With this structure, one
can apply the method of back substitution [16, Lemma 3.4] to solve
the equation $Hx = 0$ and find the kernel of H. Here, for the sake of
brevity we just mention the final result. We partition V into
a disjoint union $V = U \cup W$ in such a way that $x_W$ will be our
set of independent variables and $x_U$ will be the set of dependent
ones (i.e., $x_U$ can be expressed in terms of $x_W$). The partition
is constructed by letting $W = W_1 \cup W_2 \cup \dots \cup W_{T(G)}$
and $U = U_1 \cup U_2 \cup \dots \cup U_{T(G)}$. For each t, we construct
$W_t$ by using the staircase structure of $H_{C_t,V_t}$. We recall that
the columns of $H_{C_t,V_t}$ have the partition
$V_t = \mathcal{C}_0 \cup \mathcal{C}_1 \cup \dots \cup \mathcal{C}_{|C_t|}$.
We then construct $W_t$ as
$W_t = \mathcal{C}_0 \cup \mathcal{C}'_1 \cup \dots \cup \mathcal{C}'_{|C_t|}$,
where $\mathcal{C}'_i$ is constructed from $\mathcal{C}_i$ by removing an
arbitrary element from it ($\mathcal{C}'_i$ is empty if
$|\mathcal{C}_i| = 1$). In other words, among the variables in
$\mathcal{C}_i$ we choose one as the dependent variable and let the
others be independent variables in $W_t$. We then let
$U_t = V_t \setminus W_t$. With the sets W and U as above,
let us reorder the variables in V as U followed by W, i.e., we
reorder the variables such that we can write $x = (x_U, x_W)$.
One can show that the columns of the matrix

$$\begin{pmatrix} \mathbb{K} \\ I \end{pmatrix}, \qquad \mathbb{K} = H_{C,U}^{-1} H_{C,W} \ \text{(over GF(2))}, \tag{12}$$

form a basis for the set of solutions. Here, the matrix I denotes
the identity matrix of size $|W| = |V| - |C|$. Also, if
$\mathbb{K}_{ij} = 1$ then we have $d_G(i,j) \le T(G)$, where by
$d_G(i,j)$ we mean the distance between variables i, j in the graph G.

It is now easy to show that the Hamming weight
of any column of the matrix $\mathbb{K}$ in (12) is bounded above by
the value $\max_{i \in V} |B_G(i, T(G))|$, where by $B_G(i, T(G))$ we
mean the set of variables j such that $d_G(i,j) \le T(G)$. In the
last step, we argue that with high probability

$$T(G) \le vL + B_1 \log\log n, \tag{13}$$

where v and $B_1$ are finite constants. From (13), [16, Lemma
3.11], and the fact that the coupled ensemble has the same local
structure as the uncoupled ensemble, we then deduce that
w.h.p. $\max_{i \in V} |B_G(i, T(G))| \le e^{B_2 T(G)} \le (e^{L}\log n)^{B}$,
where B and $B_2$ are finite constants. It remains to justify (13).
Consider the DE equations (7) starting from the initial point
$x_z^0 = 1$ for $0 \le z \le L-1$ and $x_z^0 = 0$ for z at the boundaries.
Let $\delta$ be a (very) small constant. It can be shown from [18]
that there exists a constant $v = v(\alpha, K, \delta) < \infty$ such that
$x_z^{vL} \le \delta$ for all $z \in \{0, 1, \dots, L-1\}$. In other words,
the effect of the boundary (i.e., $x_z^t = 0$ for $z \ge L$ and $z < 0$)
propagates towards the positions in the middle in a wave-like
manner with a speed v, and hence at time $t = vL$ all
the values $x_z^t$ are small. Once the value of $x_z^t$ is sufficiently
small it converges to 0 doubly exponentially fast. Hence,
intuitively, the synchronous peeling algorithm needs w.h.p. an
extra $B_1 \log\log n$ steps to clear out the whole formula and
the total time taken by peeling will be $vL + B_1 \log\log n$. Of
course, this is just an intuitive argument. A formal analysis
can be carried out along the lines of [16, Lemma 3.11].



C. An Intuitive Picture of the Sparse Basis 

As we mentioned at the end of Section II-A, a ring
formula with density $\alpha \in (\alpha_d, \alpha_{d,L,w})$ has a core. The
core has a circular structure with roughly $nC(\alpha,K)$ clauses and
$nV(\alpha,K)$ variables in each position $z \in \{0, 1, \dots, L-1\}$.
Further, any two solutions of the core differ in $\Omega(nL)$
positions. Now, consider the formula obtained by removing the
clauses at the last $w-1$ positions of the core (i.e., positions
$L-w+1, \dots, L-1$). We call such a formula the opened core.
We know that the peeling algorithm succeeds on the opened
core and from Theorem 2 its solution space is a connected
cluster and admits a sparse basis. So the distant solutions of
the original core are now connected to each other via the new
solutions spanned by this sparse basis. Our objective is now
to see, at the intuitive level, what this sparse basis looks like.

All the variables in the opened core have degree at least
two except the ones at the two boundaries (we call the first
$w-1$ positions and the last $w-1$ positions the boundaries
of the chain). Once the synchronous peeling algorithm begins,
the effect of these low degree variables at the boundaries starts
to propagate like a wave towards the middle of the chain. The
algorithm evacuates the positions one-by-one with a constant
speed v approaching the middle [18]. A simple, albeit not very
accurate, analogy is a chain of properly placed domino pieces.
Once we topple a boundary piece the whole chain is toppled
with roughly a constant speed.

Consider the peeling algorithm explained in Section I-B.
This algorithm removes the variables in the graph one-by-one.
Each variable that is removed in this algorithm has either
degree 0 or 1. A variable that, at the time of being peeled,
has degree 0 is called an independent variable. A variable of
degree 1 is called a dependent variable. One can easily see
that the definition of an independent (dependent) variable is
equivalent to the definition given in the proof of Theorem 2.
In Theorem 2 we proved that the opened core has a sparse
basis. The number of elements of the basis is equal to the 
number of independent variables explored during the peeling 
algorithm. Furthermore, there is a one-to-one correspondence 
between the independent variables and the elements of the 
sparse basis, as we explain now. 

Consider the synchronous peeling procedure defined in the
proof of Theorem 2. The synchronous peeling procedure is
a compressed version of the peeling algorithm in the following
sense. At any step of synchronous peeling, we peel
all the variables in the remaining graph that have degree
0 or 1. Let us now denote the graph of the opened core
by $G^* = (C^*, V^*, E^*)$. Consider an independent variable
$i \in V^*$ and assume that the variable i is removed at step
$t_i$ of the synchronous peeling algorithm. Let $\mathcal{H}_{G^*}(i, t_i)$ be
the set of all the variables u such that$^1$ $d_{G^*}(i,u) < t_i$ and u is
peeled at some time before i. We also include in $\mathcal{H}_{G^*}(i, t_i)$
any check node (together with its edges) whose variables
are all inside $\mathcal{H}_{G^*}(i, t_i)$. Intuitively, $\mathcal{H}_{G^*}(i, t_i)$
corresponds to the history of the variable i with respect to the
peeling procedure. Figure 4 illustrates these concepts via a simple
example. As we explained above, the (synchronous) peeling




Fig. 4. Variable i is an independent variable that is peeled off at the third step
of the synchronous peeling algorithm, i.e., $t_i = 3$. The sub-graph
$\mathcal{H}_{G^*}(i, t_i)$ consists of all the variables and checks of the opened
core (together with the edges between them) that are peeled at some time
before i and whose distance from i is less than $t_i$.

procedure on the opened core propagates like a wave from 
the boundaries towards the middle of the core, with a constant 

$^1$We denote by $d_{G^*}(i, u)$ the graph distance between the variables u and
i in the opened core.



speed v. As a result, if the variable i is at a (variable) position
$p \in \{0, 1, \dots, L-1\}$, then we have $t_i \approx vp$. Consequently,
when n is large and $n \gg L$, $\mathcal{H}_{G^*}(i, t_i)$ is w.h.p. a tree
whose leaf nodes are located at one of the boundaries of the
opened core (see Figure 4). Let us now see what the basis
vector corresponding to the independent variable i looks like.
One can think of $\mathcal{H}_{G^*}(i, t_i)$ as a sub-graph or a sub-formula
of $G^*$. Also, since we are solving the equation $Hx = 0$, a
solution of $\mathcal{H}_{G^*}(i, t_i)$ can naturally be extended (lifted) to
a solution of $G^*$ by simply assigning 0 to the variables in
$G^* \setminus \mathcal{H}_{G^*}(i, t_i)$. Consider a solution of
$\mathcal{H}_{G^*}(i, t_i)$ for which the value that the variable i takes is 1.
Since the peeling succeeds on $\mathcal{H}_{G^*}(i, t_i)$ and i is an
independent variable, such a solution exists (one can find such a
solution by assigning 1 to i and then backtracking on
$\mathcal{H}_{G^*}(i, t_i)$). Such a solution, when extended
to a solution of $G^*$, is the corresponding basis element for the
variable i.
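
The recipe "assign 1 to i and then backtrack" can be sketched in code for a formula with an empty 2-core. This is an illustration under stated assumptions: it uses the sequential peeling order rather than the synchronous rounds, sets all other independent variables to 0, and the names `peel_order` and `kernel_basis` are hypothetical.

```python
def peel_order(n, clauses):
    """Sequential peeling; returns [(variable, clause or None), ...] and the core."""
    incident = [set() for _ in range(n)]
    for c, vs in enumerate(clauses):
        for i in vs:
            incident[i].add(c)
    removed, core, order = [False] * n, set(range(len(clauses))), []
    stack = [i for i in range(n) if len(incident[i]) <= 1]
    while stack:
        i = stack.pop()
        if removed[i]:
            continue
        removed[i] = True
        c = next(iter(incident[i]), None)
        order.append((i, c))
        if c is not None:
            core.discard(c)
            for j in clauses[c]:
                incident[j].discard(c)
                if not removed[j] and len(incident[j]) <= 1:
                    stack.append(j)
    return order, core


def kernel_basis(n, clauses):
    """One kernel vector of H per independent (degree-0 when peeled) variable."""
    order, core = peel_order(n, clauses)
    if core:
        raise ValueError("formula has a non-empty 2-core")
    basis = {}
    for i, c in order:
        if c is not None:
            continue                      # dependent variable: no basis element
        y = [0] * n
        y[i] = 1                          # set the independent variable to 1
        for j, cj in reversed(order):     # back-substitute with b = 0
            if cj is not None:
                y[j] = sum(y[u] for u in clauses[cj] if u != j) % 2
        basis[i] = y
    return basis


example = [[0, 1, 2], [2, 3, 4]]
basis = kernel_basis(5, example)
```

Each returned vector is nonzero only on its independent variable and on dependent variables reached by the backward pass, which is the sparsity mechanism described above.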